The introduction of metagenomics into the field of virology has facilitated the exploration of viral communities in various 
natural habitats. Understanding the viral ecology of a variety of sample types throughout the biosphere is important per se, 
but it also has potential applications in clinical and diagnostic virology. However, the procedures used in viral metagenomics 
may introduce technical errors, such as amplification bias, and public viral databases remain very limited; both factors may 
hamper the determination of the viral diversity in samples. This review considers the current state of viral metagenomics, 
based on examples from Korean viral metagenomic studies, i.e., rice paddy soil, fermented foods, human gut, seawater, and the 
near-surface atmosphere. Viral metagenomics has become widespread due to various methodological developments, and 
much attention has been focused on studies that consider the intrinsic role of viruses that interact with their hosts. 
Introduction 

Viruses are the most abundant and diverse biological 
entities in the biosphere, and their global numbers have been 
estimated at 1.2 x 10^30 and 2.6 x 10^30 in the ocean and soil, 
respectively [1]. Thus, viruses are key elements that 
contribute to the life cycles of cellular organisms [2]. 
However, the study of viral ecology in natural habitats has 
been limited due to the difficulties of viral culture [3]. The 
lack of a ubiquitous marker gene, such as the 16S rRNA gene 
shared by all bacteria and archaea, also hampered our 
understanding of the genetic diversity of viruses, prior to the 
introduction of metagenomics into the field of viral ecology [1, 4]. 
Recently, viral metagenomics has enabled researchers to 
explore the community structure and diversity of viruses in 
various natural ecosystems [5]. This methodology does not 
depend on a priori knowledge of the viral types that may be 
present [6, 7]. 

The first viral metagenome of uncultured marine viral 
communities was published in 2002 [8], and there have been 
many subsequent advances in the methodologies (e.g., 
methods for amplifying the initial viral genomes) and tools 
used for bioinformatics analysis in viral metagenomics. 



These technical developments have facilitated explorations 
of the abundance and diversity of viruses from a wide range 
of natural habitats in Korea. In Korea, viral metagenomics 
was applied for the first time to unique samples, including 
fermented foods and atmospheric samples, as well as habitats 
where viruses are expected to have significant roles, 
such as rice paddy soil, seawater, and the human gut. This 
review describes recent advances in viral metagenomics and 
provides summaries of studies that have been conducted to 
characterize Korean viral metagenomes. In addition, the 
advantages and disadvantages of the most widely used viral 
DNA amplification methods are discussed, based on 
empirical knowledge. Further directions for the study of 
virus-host interactions are also highlighted. 

Approaches to Viral Ecology Using Metagenomics 

Viral metagenomics is the study of viral metagenomes 
(known as viromes), which are obtained directly from 
environmental samples using viral particle purification and 
shotgun sequencing. Viral metagenomic studies have 
increased gradually since the expansion of metagenomic 
approaches to viral ecology; i.e., over 200 investigations 
(including 61 reviews) of the viral communities in 
environmental samples have been published (Fig. 1) [8-13]. 
In 2002, Breitbart et al. [8] used shotgun library sequencing 
and reported that the majority of the sequences in marine 
viral metagenomes shared no similarity with any genes in 
public databases, which suggested that most environmental 
viruses remained uncharacterized. This was the first study 
using viral metagenomic approaches in the field. 
Subsequently, many studies have surveyed the viral diversity 
of unexplored habitats using viral metagenomics. 

Fig. 1. Overview of viral metagenomic studies conducted 
between 2002 and 2013. The cumulative number of 
publications determined by PubMed searches using the 
keywords '(virus OR viral OR virome) AND (metagenome OR 
metagenomics)' is shown on the y-axis. The first viral 
metagenome and Korean metagenomes (highlighted by 
arrows), as well as the technical development of 
high-throughput sequencing platforms (below the 
highlighted arrow), are shown with the year of publication 
and/or public release. 

Cultivation in a host is usually necessary to obtain a virus 
from the environment that is being investigated. However, 
the application of metagenomics techniques based on 
genetic information can circumvent this obstacle [14]. Based 
on the physical characteristics of virions, viral particles can 
be isolated from environmental samples and enriched using 
a combination of size filtration (e.g., <0.22 μm) and 
density gradient centrifugation (e.g., 1.35-1.5 g/mL cesium 
chloride) [14], so that the virome can be obtained from the 
purified viral particles. Viral genomes lack evolutionarily 
conserved genes comparable to the prokaryotic 16S 
ribosomal RNA gene. Therefore, the fragmented viral 
metagenomic sequences obtained by whole-genome shotgun 
DNA sequencing are used to analyze the viral ecology 
instead. Viral genomes are small (on average, they comprise 
a few to several dozen kilobases), so useful genome 
coverage can be achieved easily by DNA sequencing. The 
advent of high-throughput sequencing techniques, such as 
454 pyrosequencing and Illumina sequencing, makes it 
easier to achieve suitable genome coverage at low cost. 
Indeed, more than 90 complete or nearly complete novel 
viral genomes were assembled using these methods in recent 
studies [9, 10, 15-20]. However, an amplification step is 
generally unavoidable for small viral genomes prior to 
DNA sequencing. Linker amplified shotgun library (LASL) 
and multiple displacement amplification (MDA) are the 
main methods used to amplify viromes in viral 
metagenomics. The LASL method, developed by Breitbart et 
al. [8], PCR-amplifies a virome after adapter attachment to 
randomly fragmented viral DNAs. Adapter attachment is 
restricted to double-stranded DNA sequences, so only 
double-stranded DNA (dsDNA) viruses can be detected using 
the LASL method. Thus, the dominance of dsDNA tailed 
bacteriophages was reported initially in uncultured marine 
viral assemblages [8], as well as other environmental 
samples, such as kimchi [11], aquatic water [8], and feces 
[21]. The MDA technique [22] amplifies DNA isothermally 
using the phi29 polymerase and random hexamer primers 
and was used previously in microbiology [22]. 
The high amplification efficiency of this method means that 
it is suitable for amplification during viral metagenomics 
applications. In particular, the phi29 polymerase used in the 
MDA technique preferentially amplifies circular genomes 
(by an estimated 100-fold) [10], and it has facilitated the 
discovery of abundant single-stranded DNA and RNA viruses 
in environmental samples [23, 24] (described in detail below). 
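
The point about genome size and coverage can be made concrete 
with a short calculation. The sketch below is illustrative only: 
it assumes the standard Lander-Waterman idealization (reads 
sampled uniformly at random from a single genome), which this 
review does not itself invoke, and the function names and 
example numbers are hypothetical. It ignores amplification bias 
and the mixed composition of a real virome. 

```python
import math


def expected_coverage(n_reads: int, read_length_bp: float,
                      genome_size_bp: float) -> float:
    """Mean sequencing depth C = N * L / G (idealized, uniform sampling)."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return n_reads * read_length_bp / genome_size_bp


def fraction_covered(coverage: float) -> float:
    """Lander-Waterman estimate of the genome fraction hit at least once."""
    return 1.0 - math.exp(-coverage)


if __name__ == "__main__":
    # Hypothetical example: 1,000 reads of 400 bp from one 40 kb viral genome.
    depth = expected_coverage(1_000, 400, 40_000)
    print(f"mean depth: {depth:.1f}x")
    print(f"fraction covered: {fraction_covered(depth):.5f}")
```

Under these assumptions, 1,000 reads of 400 bp give a mean depth 
of 10x for a 40 kb genome, whereas the same reads spread over a 
4 Mb genome would give only 0.1x, which illustrates why small 
viral genomes are comparatively easy to cover. 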
Assignment of Viral Sequences Based on 
Their Metadata and Potential Hosts 

The analysis of viral metagenome data using bioinformatics 
is one of the most challenging aspects of viral 
metagenomics. One million to 1 billion reads of viral 
metagenomic sequences are typically generated by 
high-throughput sequencing platforms (with an average read 
length of 350-400 bp using 454 GS-FLX Titanium and 2 x 
150 bp using Illumina HiSeq 2500). After removing any 
low-quality, redundant, and chimeric sequences, the viral 
sequences are compared to sequences in public databases 
(e.g., the GenBank non-redundant nucleotide database, 
MG-RAST, and CAMERA) using BLAST [25] or USEARCH 
[26]. The identification of viral sequences based on 
significant amino acid similarity (E-value of < 10^-3) was first 
described by Breitbart et al. [8], and it has since been 
extended to the exploration of environmental viromes, 
although the E-value threshold applied in viral metagenomic 
studies appears to be regarded as "a loose standard" [27]. 
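
As a rough illustration of this assignment step, the sketch 
below filters BLAST tabular output (the standard 12-column 
"-outfmt 6" layout, in which the E-value is the 11th column) by 
an E-value cutoff and reports the percentage of reads left 
unassigned. The function names, the read identifiers, and the 
default cutoff of 1e-3 are illustrative assumptions, not the 
pipeline used in the studies cited here. 

```python
import csv


def assigned_queries(blast_tab_lines, evalue_cutoff=1e-3):
    """Return the query IDs with at least one hit below the E-value cutoff.

    blast_tab_lines: iterable of tab-separated lines in BLAST -outfmt 6
    column order (qseqid, sseqid, pident, length, mismatch, gapopen,
    qstart, qend, sstart, send, evalue, bitscore).
    """
    hits = set()
    for row in csv.reader(blast_tab_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if float(row[10]) < evalue_cutoff:
            hits.add(row[0])
    return hits


def percent_unassigned(all_read_ids, assigned):
    """Percentage of reads with no significant database hit."""
    total = len(all_read_ids)
    if total == 0:
        raise ValueError("no reads supplied")
    unassigned = sum(1 for read_id in all_read_ids if read_id not in assigned)
    return 100.0 * unassigned / total


if __name__ == "__main__":
    # Hypothetical hits: read1 is significant, read2 is not.
    lines = [
        "read1\tphageA\t45.0\t100\t55\t0\t1\t300\t1\t100\t1e-10\t80.0",
        "read2\tphageB\t30.0\t50\t35\t0\t1\t150\t1\t50\t0.5\t25.0",
    ]
    reads = ["read1", "read2", "read3", "read4"]
    print(percent_unassigned(reads, assigned_queries(lines)))
```

Because the cutoff is a parameter, the same reads can be 
re-scored under a stricter threshold to see how sensitive the 
"unassigned" fraction is to that choice. 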

Most of the sequences in the environmental viromes detected 
by viral metagenomics are defined as orphan (unassigned) 
sequences. A large proportion of viral sequences (on average 
40% to 50%, and occasionally up to 90%) share no amino 
acid similarity with previously observed genes, so they are 
characterized as "unknown" [5]. Comparisons of viral 
sequences with the data in public databases have 
demonstrated that little is known about environmental 
viruses. Thus, the unassigned sequences in viral 
metagenomes are often regarded as "junk sequences" due to a 
lack of suitable bioinformatics tools and viral databases for 
their characterization [1, 28]. At present, the viral databases 
are biased toward animal and plant viruses, whereas viruses 
that infect prokaryotes (bacteriophages) are sparsely 
represented. Most of the latter are restricted to phages that 
infect bacteria belonging to the phyla Proteobacteria, 
Firmicutes, and Actinobacteria [29, 30]. Moreover, even 
"known" viral sequences share low amino acid similarities 
(<50%) with viral protein sequences [9, 11, 13, 31], so the 
majority of environmental viruses probably represent novel 
viral species, and viral diversity is much greater than 
considered previously. The observation of a high percentage of 
ORFans (open reading frames with no homologs among known 
genes in the databases) in viral genomes [32] also supports 
the novelty of environmental viromes. Thus, researchers 
could discover novel viruses in orphan sequence pools that 
currently remain "untapped resources." These findings 
indicate our current lack of knowledge about viral genetic 
information and emphasize the need for physiological 
studies of viruses to understand viral ecology based on 
genomic data. 

Viral Metagenomes from Korean Samples 

Using viral metagenomic approaches, viral diversity and 
abundance have been investigated in various natural 
ecosystems in Korea, including rice paddy soil, fermented 
foods, the human gut, seawater, and the near-surface 
atmosphere (Table 1). The morphologies of environmental 
viruses have been imaged using transmission electron 
microscopy (Fig. 2). Sipho- and podo-like tailed viruses were 
found in fermented foods (Fig. 2A-2C), while non-tailed 
small viruses were detected in the near-surface atmosphere 
(Fig. 2G-2I). Various types of tailed, non-tailed, circular, and 
long linear viruses were found in the human gut (Fig. 
2D-2F). In general, the virions ranged in size from 30 to 60 
nm. In agreement with the results of previous viral 
metagenomic studies [23, 31], over half of the sequences in 
the Korean environmental viromes were described as orphan 
sequences, based on comparisons with viral proteins in 
public databases (Table 1). Most of the sequences identified 
were assigned to the Siphoviridae, Podoviridae, and Myoviridae 
families of dsDNA viruses and the Circoviridae, 
Geminiviridae, Nanoviridae, and Microviridae families of ssDNA 
viruses (Fig. 3). The first viral metagenomic study in Korea 
surveyed uncultured viral assemblages in rice paddy soil in 
2008 [10], where MDA was used with phi29 DNA 
polymerase and random hexamer primers to amplify viral 
DNA and to construct clone libraries for metagenome 
sequencing. The soil was found to contain a rich pool of 
unknown ssDNA viruses and dsDNA viruses. This study also 
focused on the effect of MDA amplification on different 
types of genomic DNA and showed that MDA preferentially 
amplified circular DNA genomes. This was also demonstrated 
using an environmental sample from surface seawater [12], 
where dsDNA viruses alone were retrieved in the LASL 
library, whereas ssDNA viruses were overwhelmingly 
represented in the MDA library. Thus, the amplification 
methods used in viral metagenomics can greatly affect the 
ratios of viral sequences and lead to inaccurate estimates of 
viral diversity. 

Table 1. Published Korean viral metagenomic studies 

Sample type | Rice paddy soil | Fermented foods | Seawater surface | Human feces | Atmosphere 
Location | Daejeon, Korea | Korea | The western sea, Korea | Seoul, Korea | Seoul and Gyeonggi-do, Korea 
Sequencing information | Sanger sequencing | GS FLX Titanium | GS FLX | GS FLX Titanium | GS FLX Titanium 
Amplification method | MDA | LASL | LASL and MDA | MDA | MDA 
Target | dsDNA and ssDNA viruses | dsDNA viruses | dsDNA and ssDNA viruses | ssDNA viruses | ssDNA viruses 
Similarity threshold | E-value < 10^[illegible] ([illegible]) | E-value < 10^[illegible] (BLASTx) | E-value < 10^[illegible] ([illegible]) | E-value < 10^[illegible] ([illegible]) | E-value < 10^[illegible] (BLASTx) 
Not assigned (%) | 64-67 | 37-50 | 90-94 | 73-94 | 50-80 
Accession no. | ABQX01000001-ABQX01000878 (GenBank) | SRP002583 (SRA) | 4464802.3, 4464804.3 and 4464805.3 (MG-RAST) | SRP005097 (SRA) and 4449580.3-4449584.3 (SEED) | SRP007810.1 (SRA) 

MDA, multiple displacement amplification; LASL, linker amplified shotgun libraries; dsDNA, double-stranded DNA; ssDNA, 
single-stranded DNA. 

Fig. 3. Comparison of the compositions of viral assemblages 
identified in Korean environments. The percentages of viral 
families in each virome indicate the assignments of the 
metagenomic sequences to viral families. The amplification 
methods used in each study are designated as follows: M, 
multiple displacement amplification; L, linker-amplified 
shotgun library. 
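
The effect of preferential amplification on apparent community 
composition can be illustrated with a toy calculation. The 
sketch below is a deliberately simplified model, not the 
analysis used in the cited studies: it assumes each genome type 
is multiplied by a fixed factor, and the function name, the 
starting abundances, and the 100-fold factor for circular ssDNA 
genomes are hypothetical inputs chosen only to echo the order of 
magnitude mentioned in the text. 

```python
def observed_fractions(true_abundance, amplification_factor):
    """Rescale true relative abundances by per-type amplification factors.

    true_abundance: dict mapping genome type -> true relative abundance.
    amplification_factor: dict mapping genome type -> fold amplification;
    types missing from this dict are treated as amplified 1-fold.
    """
    weighted = {
        genome_type: abundance * amplification_factor.get(genome_type, 1.0)
        for genome_type, abundance in true_abundance.items()
    }
    total = sum(weighted.values())
    if total <= 0:
        raise ValueError("total weighted abundance must be positive")
    return {genome_type: w / total for genome_type, w in weighted.items()}


if __name__ == "__main__":
    # Hypothetical community: 99% linear dsDNA, 1% circular ssDNA genomes.
    true_mix = {"linear dsDNA": 0.99, "circular ssDNA": 0.01}
    bias = {"circular ssDNA": 100.0}  # assumed preferential amplification
    for genome_type, frac in observed_fractions(true_mix, bias).items():
        print(f"{genome_type}: {frac:.3f}")
```

In this toy case, a genome type that makes up 1% of the 
community accounts for about half of the amplified library, 
which is the kind of distortion that makes library composition 
an unreliable proxy for true viral abundance. 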

Next, Park et al. [11] investigated the abundance and 
diversity of uncultured viral assemblages in fermented 
shrimp, kimchi, and sauerkraut (fermented foods that have 
been consumed for a long time around the world). In contrast 
to the soil virome, dsDNA bacteriophages from the families 
Myoviridae, Podoviridae, and Siphoviridae dominated the 
fermented foods, and they contained a low complexity of 
viral assemblages compared with other environmental 
habitats, such as seawater, human feces, marine sediment, 
and soil. However, it is possible that the viral diversity of the 
viromes detected in fermented foods may have been 
constrained to dsDNA viruses by the LASL method. 

A large number of mostly uncharacterized microbes, including 
bacteria, archaea, microbial eukarya, and viruses, inhabit 
the human gastrointestinal tract, with up to 10^12 bacteria 
per gram of feces [33], and it is expected that gut viruses 
will affect the relationships among viruses, bacteria, and gut 
epithelial cells [34]. Kim et al. [13] investigated the 
abundance and diversity of DNA viruses, particularly ssDNA 
viruses, in fecal samples from five healthy Koreans. Using 
epifluorescence microscopy with SYBR Gold staining [35], 
the viral abundance was estimated at 10^8 to 10^9 per gram 
of feces, which was 10-fold less than the bacterial 
abundance, in contrast to many other environments (e.g., 
aquatic environments) that harbor 10-100-fold more viruses 
than bacteria. Moreover, the diversity of gut viral 
assemblages was lower than that of gut bacteria. These 
results support Reyes et al. [29], who found that 
viral-microbial interactions in the human intestine could not 
be described by the predator-prey relationship referred to as 
"kill the winner," which is driven by a lytic life cycle. 

Airborne viruses are now regarded as major environmental 
risk factors for complex disease pathogenesis [36-38]. 
However, the atmosphere remains "one of the last 
frontiers of biological exploration on Earth" [9]. Using viral 
metagenomics with an advanced airborne particle sampling 
system, Whon et al. [9] conducted the first study of the 
diversity and community composition of airborne viruses in 
the near-surface atmosphere. The viral abundance in the 
atmosphere exhibited seasonal changes (increasing from 
autumn to winter before decreasing until spring) in the 
range of 10^6 to 10^7 viruses per m^3, and the temporal 
variations in viral abundance were inversely correlated with 
seasonal changes in temperature and absolute humidity. 
Plant-associated ssDNA geminivirus-related viruses and 
animal-infecting circoviruses dominated the viral 
assemblages, with low numbers of nanoviruses and 
microphages in air viromes, which suggests that airborne 
viral assemblages are affected greatly by terrestrial plants 
and animal activities. 

The compositions of viral assemblages are strongly 
influenced by how the virome is amplified. Thus, the 
compositions of the viral assemblages detected in fermented 
foods and marine samples are biased toward dsDNA viruses, 
such as sipho-, podo-, and myophages, when the LASL 
method is used, whereas the viral assemblages detected in 
rice paddy soil, human gut, marine, and near-surface 
atmosphere samples contain high proportions of ssDNA viral 
sequences, due to the use of MDA, as shown in Fig. 3. In 
addition, the compositions of the viral assemblages 
characterized in Korean environments tend to depend on 
their specific microbial features. The human gastrointestinal 
tract and fermented foods are exposed to massive numbers 
of gut bacteria and lactic acid bacteria, respectively [39-41]. 
By contrast, the atmosphere contains far less cellular 
metabolism and reproductive activity than other 
environments, such as the soil, seawater, fermented foods, 
and the human gut [42]. Consistent with this, the lowest 
abundance of eukaryotic viruses was observed in the viral 
assemblages in the human gut and fermented foods, whereas 
a high abundance of eukaryotic viruses was detected in the 
near-surface atmosphere. High levels of prokaryotic and 
eukaryotic cells are present in rice paddy soil [43] and 
seawater [44], so comparable amounts of bacteriophages and 
eukaryotic viruses were detected in their viromes. 

Discovery of Single-Stranded DNA Viruses 
in Korean Environments 

The development of viral metagenomics has facilitated the 
discovery of novel, previously undescribed viral species. 
Because the MDA method preferentially amplifies the 
circular genomes of ssDNA viruses, a large number of 
ssDNA viral sequences have been identified in 
environmental viromes. Thus, there is great interest in the 
distribution and host range of ssDNA viruses. In particular, 
microphages from the family Microviridae have been 
identified in a wide range of environments [13, 45, 46]. In 
contrast to marine dsDNA phages, marine ssDNA phages in 
the family Microviridae have distinct spatial and temporal 
distributions [45]. ssDNA microphages were 
abundant in the healthy human gut and their genotypes were 
much more diverse than those reported previously [13]. 
Moreover, prophage-like elements in the genomes of gut 
microbes, such as Bacteroides and Prevotella spp., were 
characterized as a novel subgroup in the family Microviridae 
[13, 47], while viral sequences from the human gut 
clustered with prophage-like elements from Bacteroides and 
Prevotella spp. [13]. This suggests that Bacteroides and 
Prevotella spp. are included within the host range of the 
ssDNA microphages in the human gut [47]. 

Eukaryotic ssDNA viruses that infect plants and 
mammals have been identified in many environmental 
viromes. Circoviruses that are known to infect birds and pigs 
have also been identified in the viromes of invertebrates and 
a fish [48, 49], while geminiviruses that cause plant diseases 
have been detected in whiteflies, which act as insect vectors 
of plant viruses [50]. A recent study by Whon et al. [9] 
investigated airborne DNA viral assemblages in near-surface 
atmosphere samples and showed that a high number of 
viruses (10^6 to 10^7 viruses per m^3) were present in the 
air, which were dominated by geminivirus-related viruses. 
These viruses were identified as plant fungal 
pathogen-infecting mycoviruses (i.e., Sclerotinia sclerotiorum 
hypovirulence-associated DNA virus 1), which indicates that 
the airborne viral assemblages in the near-surface atmosphere 
may have strong interactions with plants. These results 
highlight the extensive distribution of ssDNA viruses in a 
wide range of environments, and their host ranges may be 
wider than previously recognized. Thus, the discovery of 
novel genomes of ssDNA viral families in metagenomic 
studies could revolutionize our knowledge of the ecology 
and evolution of ssDNA viruses. 

Virus-Host Interactions and Emerging Technologies 

In the last decade, viral ecologists have focused on 
community-level analyses of viruses to understand their 
abundance and genetic diversity in specific environments. 
Viruses, particularly bacteriophages, are known to control 
host populations via the "kill the winner" mechanism and to 
drive mortality and evolutionary change in microorganisms 
via lateral gene transfer during infection of their host 
bacteria, although the basic issue of "who infects 
whom" is poorly understood [2, 29, 51, 52]. In the ocean, for 
example, viruses regulate the microbial abundance, release 
dissolved organic matter, and affect global biogeochemical 
cycles by killing up to 40% of host bacteria per day [53, 54]. 
In contrast, symbiotic functions of viruses, such as 
contributions to host survival, competition, and protection 
from pathogenic infections, are beginning to be understood 
[55], and evidence for beneficial phage-host interactions 
was found in the mammalian gut ecosystem [29, 56]. When 
host survival is threatened, a variety of environmental 
factors can trigger prophage induction, and the liberated 
prophages may become completely virulent [57]. Overall, 
these studies suggest that prophage induction may be 
responsible for triggering dysbiosis and changes in the 
microbial population by altering host phenotypes, thereby 
leading to a new environmental niche. 

Traditionally, host culture-dependent techniques, such as 
plaque assays, have been widely used for the identification of 
phage and host bacteria interactions. However, plaque 
assays require isolated host bacteria; so, they are low- 
throughput methods. This method is also difficult to apply 
to environmental samples where lysogenic infections are 
prevalent, because the method relies on observations of 
visible plaque formations, which are often absent from 
lysogenic infections [3, 58, 59]. Recently, Deng et al. [52] 
demonstrated a new technique, known as "viral tagging," for 
identifying the interactions between cultivated host bacteria 
and their phages, which used the nucleic acid stain SYBR 
Gold to generate fluorescently labeled phages, so that the 
host cells fluoresced with viral tagging, thereby allowing the 
sorting of virus-tagged cells by flow cytometry [52, 60]. This 
emerging technique is undoubtedly helpful for not only 
exploring virus-host interactions in their natural habitats 
when the method is combined with other experimental 
tools, such as single viral genomics [61] and phageFISH 
[62], but also identifying viral receptors in macro-organisms 
(e.g., the mammalian gut) if the method is combined with a 
fluorescently labeled receptor protein during histological 
examinations. 

Conclusion 

The emergence of viral metagenomics has facilitated 
advances in virology and allowed us to understand novel 
aspects of viral ecology. At present, viral metagenomics is a 
powerful and sensitive technique for detecting viruses that 
cannot be identified by traditional culture- and sequence- 
based approaches. Most importantly, viral metagenomics 
suggests that novel viruses interact constantly with the 
human population. Thus, viral metagenomics can facilitate 
the improved surveillance of viral pathogens in the fields of 
public health and food security. This technique can be used 
to understand viral ecology by exploring the environmental 
viromes that are generated by viral metagenomics. 